This notebook demonstrates how to predict structures using the built-in structure_prediction package in pymatgen. We will be gathering all possible structures (via the Materials API) of the chemical systems containing the highest probability specie substitutions for our original species. We will then resubstitute the original species back into these structures, filter out duplicates as well as preexisting structures already on the Materials Project, and output the newly predicted structures.
Written using:
Author: Matthew McDermott (09/25/18)
In [1]:
# Imports we need for running structure prediction
from pymatgen.analysis.structure_prediction.substitutor import Substitutor
from pymatgen.analysis.structure_prediction.substitution_probability import SubstitutionPredictor
from pymatgen.analysis.structure_matcher import StructureMatcher, ElementComparator
from pymatgen.transformations.standard_transformations import AutoOxiStateDecorationTransformation
from pymatgen import Specie, Element
from pymatgen import MPRester
from pprint import pprint
In [2]:
# Establish rester for accessing Materials API
mpr = MPRester(api_key='#######') # INSERT YOUR OWN API KEY
Here we define two variables -- threshold for the threshold probability in making substitution/structure predictions, and num_subs for how many substitutions you wish to explore:
In [3]:
threshold = 0.001 #threshold for substitution/structure predictions
num_subs = 10 # number of highest probability substitutions you wish to see
In this section, we use the SubstitutionPredictor to predict likely specie substitutions using a data-mined approach from ICSD data. This does not yet calculate probable structures -- only which species are likely to substitute for the original species you input. The substitution prediction methodology is presented in: Hautier, G., Fischer, C., Ehrlacher, V., Jain, A., and Ceder, G. (2011) Data Mined Ionic Substitutions for the Discovery of New Compounds. Inorganic Chemistry, 50(2), 656-663. doi:10.1021/ic102031h
In [4]:
original_species = [Specie('Y',3), Specie('Mn',3), Specie('O',-2)] # List of original species for substituting into
Predict most common specie substitutions, sort by highest probability, and take the number of substitutions specified by num_subs:
In [5]:
subs = SubstitutionPredictor(threshold=threshold).list_prediction(original_species)
subs.sort(key = lambda x: x['probability'], reverse = True)
subs = subs[0:num_subs]
pprint(subs)
Create a new list of just the substituted specie combinations:
In [6]:
trial_subs = [list(sub['substitutions'].keys()) for sub in subs]
pprint(trial_subs)
Create a set of strings of each unique chemical system (elements separated by dashes):
In [7]:
elem_sys_list = [[specie.element for specie in sub] for sub in trial_subs]
chemsys_set = set()
for sys in elem_sys_list:
chemsys_set.add("-".join(map(str,sys)))
pprint(chemsys_set)
In [8]:
all_structs = {}
for chemsys in chemsys_set:
all_structs[chemsys] = mpr.get_structures(chemsys) # Getting all structures -- this can take a while!
In [9]:
auto_oxi = AutoOxiStateDecorationTransformation() # create object to determine oxidation states at each lattice site
Now create a new dictionary of all structures (with oxidation states) for each chemical system:
In [10]:
oxi_structs = {}
for chemsys in all_structs:
oxi_structs[chemsys] = []
for num, struct in enumerate(all_structs[chemsys]):
try:
oxi_structs[chemsys].append({'structure': auto_oxi.apply_transformation(struct),
'id': str(chemsys + "_" + str(num))})
except:
continue # if auto oxidation fails, try next structure
#pprint(oxi_structs)
In [11]:
sbr = Substitutor(threshold = threshold) # create a Substitutor object with structure prediction threshold
In [12]:
trans_structs = {}
for chemsys in oxi_structs:
trans_structs[chemsys] = sbr.pred_from_structures(original_species,oxi_structs[chemsys])
Filter duplicate structures using StructureMatcher:
In [13]:
sm = StructureMatcher(comparator=ElementComparator(),primitive_cell=False) # create object for structure matching
In [14]:
filtered_structs = {} # new filtered dictionary
seen_structs = [] # list of all seen structures, independent of chemical system
print("Number of entries BEFORE filtering: " + str(sum([len(sys) for sys in trans_structs.values()])))
for chemsys in trans_structs:
filtered_structs[chemsys] = []
for struct in trans_structs[chemsys]:
found = False
for struct2 in seen_structs:
if sm.fit(struct.final_structure, struct2.final_structure):
found = True
break
if not found:
filtered_structs[chemsys].append(struct)
seen_structs.append(struct)
print("Number of entries AFTER filtering: " + str(sum([len(sys) for sys in filtered_structs.values()])))
NOTE: The chemical systems to which the filtered structures are assigned might change when re-running the program. Since we are filtering for duplicates across chemical systems, either of the two systems may be reported in the filtered dictionary. Which of the two systems it is simply depends on the order in that the filter algorithm follows (and it's reading from a naturally unordered dictionary!)
Now we wish to run one more filter to remove all duplicate structures already accessible on the Materials Project.
In [15]:
known_structs = mpr.get_structures("Y-Mn-O") # get all known MP structures for original system
In [16]:
final_filtered_structs = {}
print("Number of entries BEFORE filtering against MP: " + str(sum([len(sys) for sys in filtered_structs.values()])))
for chemsys in filtered_structs:
final_filtered_structs[chemsys] = []
for struct in filtered_structs[chemsys]:
found = False
for struct2 in known_structs:
if sm.fit(struct.final_structure, struct2):
found = True
break
if not found:
final_filtered_structs[chemsys].append(struct)
print("Number of entries AFTER filtering against MP: " + str(sum([len(sys) for sys in final_filtered_structs.values()])))
pprint(final_filtered_structs)
Create final structure dictionary with StructureNL objects for each transformed structure (Note: this requires installation of pybtex):
In [17]:
final_structs = {}
for chemsys in final_filtered_structs:
final_structs[chemsys] = [struct.to_snl([{"name":"Matthew McDermott", "email":"N/A"}])
for struct in final_filtered_structs[chemsys]]
#pprint(final_structs['Y-Fe-O'][0].as_dict()) # Printing one of the StructureNL objects - this is a large dictionary!